Robotics 51
☆ 3P-LLM: Probabilistic Path Planning using Large Language Model for Autonomous Robot Navigation
Much worldly semantic knowledge can be encoded in large language models
(LLMs). Such information could be of great use to robots that want to carry out
high-level, temporally extended commands stated in natural language. However,
the lack of real-world experience that language models have is a key limitation
that makes it challenging to use them for decision-making inside a particular
embodiment. This research assesses the feasibility of using an LLM (OpenAI's
GPT-3.5-turbo) for robotic path planning. The research is motivated by the
shortcomings of conventional approaches in managing complex environments and in
producing trustworthy plans under shifting environmental conditions. Its
sophisticated natural-language processing, capacity to provide adaptive
path-planning feedback in real time, high accuracy, and few-shot learning
abilities make GPT-3.5-turbo well suited for path planning in robotics. In
numerous simulated scenarios, the
research compares the performance of GPT-3.5-turbo with that of
state-of-the-art path planners like Rapidly Exploring Random Tree (RRT) and A*.
We observed that GPT-3.5-turbo is able to provide real-time path planning
feedback to the robot and outperforms its counterparts. This paper establishes
the foundation for LLM-powered path planning for robotic systems.
comment: Exploratory Study
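The abstract does not describe the evaluation interface between the LLM and the robot. As a purely hypothetical sketch of one piece of such a pipeline, a planner-proposed grid path (whether from an LLM, RRT, or A*) could be validated for feasibility before comparison:

```python
import numpy as np

def validate_path(grid, path):
    """Check that a planner-proposed path is collision-free and 4-connected.

    grid: 2D array, 1 = obstacle, 0 = free.
    path: list of (row, col) waypoints.
    """
    for (r, c) in path:
        if not (0 <= r < grid.shape[0] and 0 <= c < grid.shape[1]):
            return False          # waypoint outside the map
        if grid[r, c] == 1:
            return False          # waypoint inside an obstacle
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        if abs(r0 - r1) + abs(c0 - c1) != 1:
            return False          # consecutive waypoints must be adjacent
    return True

grid = np.array([[0, 0, 0],
                 [1, 1, 0],
                 [0, 0, 0]])
good = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
bad  = [(0, 0), (1, 0)]           # steps into an obstacle
print(validate_path(grid, good), validate_path(grid, bad))
```

A check like this makes LLM-generated and classical-planner outputs directly comparable under a common feasibility criterion.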
☆ CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning
Elliot Chane-Sane, Pierre-Alexandre Leziart, Thomas Flayols, Olivier Stasse, Philippe Souères, Nicolas Mansard
Deep Reinforcement Learning (RL) has demonstrated impressive results in
solving complex robotic tasks such as quadruped locomotion. Yet, current
solvers fail to produce efficient policies respecting hard constraints. In this
work, we advocate for integrating constraints into robot learning and present
Constraints as Terminations (CaT), a novel constrained RL algorithm. Departing
from classical constrained RL formulations, we reformulate constraints through
stochastic terminations during policy learning: any violation of a constraint
triggers a probability of terminating potential future rewards the RL agent
could attain. We propose an algorithmic approach to this formulation, by
minimally modifying widely used off-the-shelf RL algorithms in robot learning
(such as Proximal Policy Optimization). Our approach leads to excellent
constraint adherence without introducing undue complexity and computational
overhead, thus mitigating barriers to broader adoption. Through empirical
evaluation on the real quadruped robot Solo crossing challenging obstacles, we
demonstrate that CaT provides a compelling solution for incorporating
constraints into RL frameworks. Videos and code are available at
https://constraints-as-terminations.github.io.
comment: Project webpage: https://constraints-as-terminations.github.io
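The core idea above, that each constraint violation triggers a probability of terminating the future rewards the agent could attain, can be sketched as a return computation. This is a rough reading of the abstract; the linear scaling and cap on termination probability are my assumptions, not CaT's actual schedule:

```python
import numpy as np

def termination_probs(violations, scale=1.0, p_max=0.95):
    """Map per-step constraint violations (>= 0) to termination probabilities.
    Linear scaling with a cap is an assumption for illustration."""
    return np.clip(scale * np.asarray(violations, dtype=float), 0.0, p_max)

def expected_return(rewards, violations, gamma=0.99):
    """Discounted return where each constraint violation stochastically
    terminates the episode, cutting off rewards the agent could attain."""
    p_term = termination_probs(violations)
    survive = np.cumprod(1.0 - p_term)      # probability the episode is still alive
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * survive * np.asarray(rewards, dtype=float)))

rewards = [1.0, 1.0, 1.0]
print(expected_return(rewards, [0.0, 0.0, 0.0]))   # no violations
print(expected_return(rewards, [0.0, 0.5, 0.0]))   # a violation cuts later returns
```

Because the penalty enters only through the expected return, any off-the-shelf policy-gradient method such as PPO can consume it without structural changes, matching the abstract's claim of minimal modification.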
☆ Temporal Logic Formalisation of ISO 34502 Critical Scenarios: Modular Construction with the RSS Safety Distance
Jesse Reimann, Nico Mansion, James Haydon, Benjamin Bray, Agnishom Chattopadhyay, Sota Sato, Masaki Waga, Étienne André, Ichiro Hasuo, Naoki Ueda, Yosuke Yokoyama
As the development of autonomous vehicles progresses, efficient safety
assurance methods become increasingly necessary. Safety assurance methods such
as monitoring and scenario-based testing call for formalisation of driving
scenarios. In this paper, we develop a temporal-logic formalisation of an
important class of critical scenarios in the ISO standard 34502. We use signal
temporal logic (STL) as a logical formalism. Our formalisation has two main
features: 1) modular composition of logical formulas for systematic and
comprehensive formalisation (following the compositional methodology of ISO
34502); 2) use of the RSS distance for defining danger. We find our
formalisation comes with few parameters to tune thanks to the RSS distance. We
experimentally evaluated our formalisation; using its results, we discuss the
validity of our formalisation and its stability with respect to the choice of
some parameter values.
comment: 12 pages, 4 figures, 5 tables. Accepted to SAC 2024
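The RSS safe longitudinal distance used above for defining danger has a standard closed form in the original RSS formulation; the parameter names and numeric values below are illustrative, not the paper's:

```python
def rss_longitudinal_distance(v_rear, v_front, rho, a_max, b_min, b_max):
    """RSS safe longitudinal distance between a rear and a front vehicle.

    rho:   rear vehicle's response time [s]
    a_max: rear vehicle's max acceleration during the response time
    b_min: rear vehicle's min (comfortable) braking deceleration
    b_max: front vehicle's max braking deceleration
    """
    d = (v_rear * rho
         + 0.5 * a_max * rho ** 2
         + (v_rear + rho * a_max) ** 2 / (2 * b_min)
         - v_front ** 2 / (2 * b_max))
    return max(d, 0.0)             # the bound is never negative

print(rss_longitudinal_distance(20.0, 10.0, rho=1.0, a_max=2.0, b_min=4.0, b_max=8.0))
```

Note that only the response time and the acceleration/braking bounds appear as tunables, which is consistent with the abstract's observation that the formalisation has few parameters to tune.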
☆ ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition
Weidong Xie, Lun Luo, Nanfei Ye, Yi Ren, Shaoyi Du, Minhang Wang, Jintao Xu, Rui Ai, Weihao Gu, Xieyuanli Chen
Place recognition is an important task for robots and autonomous cars to
localize themselves and close loops in pre-built maps. While single-modal
sensor-based methods have shown satisfactory performance, cross-modal place
recognition, in which images are retrieved from a point-cloud database, remains
a challenging problem. Current cross-modal methods use depth estimation to
transform images into 3D points for modality conversion, which is usually
computationally intensive and requires expensive labeled data for depth
supervision. In this work, we introduce a fast and lightweight framework to
encode images and point clouds into place-distinctive descriptors. We propose
an effective Field of View (FoV) transformation module to convert point clouds
into an analogous modality as images. This module eliminates the necessity for
depth estimation and helps subsequent modules achieve real-time performance. We
further design a non-negative factorization-based encoder to extract mutually
consistent semantic features between point clouds and images. This encoder
yields more distinctive global descriptors for retrieval. Experimental results
on the KITTI dataset show that our proposed methods achieve state-of-the-art
performance while running in real time. Additional evaluation on the HAOMO
dataset covering a 17 km trajectory further shows the practical generalization
capabilities. We have released the implementation of our methods as open source
at: https://github.com/haomo-ai/ModaLink.git.
comment: 8 pages, 11 figures, conference
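The FoV transformation converts point clouds into an image-like modality without depth estimation. A generic spherical (range-image) projection in that spirit can be sketched as follows; this is not ModaLink's actual module, and the resolution and vertical field of view are arbitrary:

```python
import numpy as np

def pointcloud_to_range_image(points, H=32, W=256, fov_up=15.0, fov_down=-15.0):
    """Project an (N, 3) point cloud into an H x W panoramic range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                                 # [-pi, pi]
    pitch = np.arcsin(np.clip(z / r, -1.0, 1.0))
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = ((0.5 * (1.0 - yaw / np.pi)) * W).astype(int) % W  # horizontal bin
    v = ((fov_up_r - pitch) / (fov_up_r - fov_down_r) * H).astype(int)
    v = np.clip(v, 0, H - 1)                               # vertical bin
    img = np.zeros((H, W))
    img[v, u] = r                                          # store range per pixel
    return img

pts = np.array([[1.0, 0.0, 0.0]])          # a point straight ahead, at eye level
img = pointcloud_to_range_image(pts)
print(np.argwhere(img > 0))                # it lands in the centre of the image
```

Because the projection is a closed-form coordinate transform, it runs in microseconds per scan, which is the kind of saving that makes dropping depth estimation attractive for real-time retrieval.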
☆ MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model
In the realm of data-driven AI technology, the application of open-source
large language models (LLMs) in robotic task planning represents a significant
milestone. Recent robotic task planning methods based on open-source LLMs
typically leverage vast task planning datasets to enhance models' planning
abilities. While these methods show promise, they struggle with complex
long-horizon tasks, which require comprehending more context and generating
longer action sequences. This paper addresses this limitation by proposing
MLDT, the Multi-Level Decomposition Task planning method. This method
innovatively decomposes tasks at the goal-level, task-level, and action-level
to mitigate the challenge of complex long-horizon tasks. In order to enhance
open-source LLMs' planning abilities, we introduce a goal-sensitive corpus
generation method to create high-quality training data and conduct instruction
tuning on the generated corpus. Since the complexity of the existing datasets
is not high enough, we construct a more challenging dataset, LongTasks, to
specifically evaluate planning ability on complex long-horizon tasks. We
evaluate our method using various LLMs on four datasets in VirtualHome. Our
results demonstrate a significant performance enhancement in robotic task
planning, showcasing MLDT's effectiveness in overcoming the limitations of
existing methods based on open-source LLMs as well as its practicality in
complex, real-world scenarios.
☆ PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab Investigations
Robot systems in education can leverage large language models' (LLMs) natural
language understanding capabilities to provide assistance and facilitate
learning. This paper proposes a multimodal interactive robot (PhysicsAssistant)
built on YOLOv8 object detection, cameras, speech recognition, and an LLM-based
chatbot to assist students in physics labs. We conduct a user study with ten
8th-grade students to empirically evaluate the performance of PhysicsAssistant
with the help of a human expert. The expert rates the assistant's responses to
student queries for educational support on a 0-4 scale based on Bloom's
taxonomy. We compared the performance of
PhysicsAssistant (YOLOv8+GPT-3.5-turbo) with GPT-4 and found that the human
expert rating of both systems for factual understanding is the same. However,
the rating of GPT-4 for conceptual and procedural knowledge (3 and 3.2 vs 2.2
and 2.6, respectively) is significantly higher than PhysicsAssistant (p <
0.05). On the other hand, the response time of GPT-4 is significantly longer
than that of PhysicsAssistant (3.54 vs 1.64 s, p < 0.05). Hence, despite the relatively
lower response quality of PhysicsAssistant than GPT-4, it has shown potential
for being used as a real-time lab assistant to provide timely responses and can
offload teachers' labor to assist with repetitive tasks. To the best of our
knowledge, this is the first attempt to build such an interactive multimodal
robotic assistant for K-12 science (physics) education.
comment: Submitted to IEEE RO-MAN
☆ An Efficient Risk-aware Branch MPC for Automated Driving that is Robust to Uncertain Vehicle Behaviors
One of the critical challenges in automated driving is ensuring safety of
automated vehicles despite the unknown behavior of the other vehicles. Although
motion prediction modules are able to generate a probability distribution
associated with various behavior modes, their probabilistic estimates are often
inaccurate, thus leading to a possibly unsafe trajectory. To overcome this
challenge, we propose a risk-aware motion planning framework that appropriately
accounts for the ambiguity in the estimated probability distribution. We
formulate the risk-aware motion planning problem as a min-max optimization
problem and develop an efficient iterative method by incorporating a
regularization term in the probability update step. Via extensive numerical
studies, we validate the convergence of our method and demonstrate its
advantages compared to the state-of-the-art approaches.
☆ Teaching Introductory HRI: UChicago Course "Human-Robot Interaction: Research and Practice"
In 2020, I designed the course CMSC 20630/30630 Human-Robot Interaction:
Research and Practice as a hands-on introduction to human-robot interaction
(HRI) research for both undergraduate and graduate students at the University
of Chicago. Since 2020, I have taught and refined this course each academic
year. Human-Robot Interaction: Research and Practice focuses on the core
concepts and cutting-edge research in the field of human-robot interaction
(HRI), covering topics that include: nonverbal robot behavior, verbal robot
behavior, social dynamics, norms & ethics, collaboration & learning, group
interactions, applications, and future challenges of HRI. Course meetings
involve students in the class leading discussions about cutting-edge
peer-reviewed HRI research publications. Students also participate in a
quarter-long collaborative research project, where they pursue an HRI research
question that often involves conducting their own human-subjects research study
where they recruit human subjects to interact with a robot. In this paper, I
detail the structure of the course and its learning goals as well as my
reflections and student feedback on the course.
comment: 4 pages, 2 tables, Presented at the Designing an Intro to HRI Course
Workshop at HRI 2024 (arXiv:2403.05588)
☆ Sampling-Based Motion Planning with Online Racing Line Generation for Autonomous Driving on Three-Dimensional Race Tracks
Existing approaches to trajectory planning for autonomous racing employ
sampling-based methods, generating numerous jerk-optimal trajectories and
selecting the most favorable feasible trajectory based on a cost function
penalizing deviations from an offline-calculated racing line. While successful
on oval tracks, these methods face limitations on complex circuits due to the
simplistic geometry of jerk-optimal edges failing to capture the complexity of
the racing line. Additionally, they only consider two-dimensional tracks,
potentially leaving the vehicle's actual dynamic potential unused or exceeding it. In this
paper, we present a sampling-based local trajectory planning approach for
autonomous racing that can maintain the lap time of the racing line even on
complex race tracks and consider the race track's three-dimensional effects. In
simulative experiments, we demonstrate that our approach achieves lower lap
times and improved utilization of dynamic limits compared to existing
approaches. We also investigate the impact of online racing line generation, in
which the time-optimal solution is planned from the current vehicle state for a
limited spatial horizon, in contrast to a closed racing line calculated
offline. We show that combining the sampling-based planner with the online
racing line generation can significantly reduce lap times in multi-vehicle
scenarios.
comment: 8 pages, submitted to be published at the 35th IEEE Intelligent
Vehicles Symposium, June 2 - 5, 2024, Jeju Shinhwa World, Jeju Island, Korea
☆ Will You Participate? Exploring the Potential of Robotics Competitions on Human-centric Topics
This paper presents findings from an exploratory needfinding study
investigating the current state of research in the robotics community, and its
potential participation in competitions, on four human-centric topics: safety,
privacy, explainability, and federated learning. We conducted a survey
with 34 participants across three distinguished European robotics consortia,
nearly 60% of whom possessed over five years of research experience in
robotics. Our qualitative and quantitative analysis revealed that current
mainstream robotic researchers prioritize safety and explainability, expressing
a greater willingness to invest in further research in these areas. Conversely,
our results indicate that privacy and federated learning garner less attention
and are perceived to have lower potential. Additionally, the study suggests a
lack of enthusiasm within the robotics community for participating in
competitions related to these topics. Based on these findings, we recommend
targeting other communities, such as the machine learning community, for future
competitions related to these four human-centric topics.
☆ RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos
Procedure Planning in instructional videos entails generating a sequence of
action steps based on visual observations of the initial and target states.
Despite the rapid progress in this task, there remain several critical
challenges to be solved: (1) Adaptive procedures: Prior works hold an
unrealistic assumption that the number of action steps is known and fixed,
leading to non-generalizable models in real-world scenarios where the sequence
length varies. (2) Temporal relation: Understanding the step temporal relation
knowledge is essential in producing reasonable and executable plans. (3)
Annotation cost: Annotating instructional videos with step-level labels (i.e.,
timestamp) or sequence-level labels (i.e., action category) is demanding and
labor-intensive, limiting their generalizability to large-scale datasets. In
this work, we propose a new and practical setting, called adaptive procedure
planning in instructional videos, where the procedure length is not fixed or
pre-determined. To address these challenges, we introduce the
Retrieval-Augmented Planner (RAP) model. Specifically, for adaptive procedures, RAP adaptively
determines the conclusion of actions using an auto-regressive model
architecture. For temporal relation, RAP establishes an external memory module
to explicitly retrieve the most relevant state-action pairs from the training
videos and revises the generated procedures. To tackle high annotation cost,
RAP adopts a weakly supervised learning scheme to expand the training dataset
to other task-relevant, unannotated videos by generating pseudo labels for
action steps. Experiments on CrossTask and COIN benchmarks show the superiority
of RAP over traditional fixed-length models, establishing it as a strong
baseline solution for adaptive procedure planning.
comment: 23 pages, 6 figures, 12 tables
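RAP's external memory module retrieves the most relevant state-action pairs from the training videos. A minimal nearest-neighbour sketch of such a lookup follows; the embedding dimensions, keys, and action labels are made up for illustration:

```python
import numpy as np

def retrieve(memory_keys, memory_values, query, k=2):
    """Return the k memory entries whose keys are most similar to `query`
    under cosine similarity, as a toy external-memory lookup."""
    keys = np.asarray(memory_keys, dtype=float)
    q = np.asarray(query, dtype=float)
    sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]                  # indices of highest similarity
    return [memory_values[i] for i in top]

keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]     # toy state embeddings
values = ["pour water", "crack egg", "fill kettle"]
print(retrieve(keys, values, query=[1.0, 0.05]))
```

The retrieved pairs would then condition or revise the auto-regressively generated procedure, as the abstract describes.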
☆ Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered Scenes
Fast and robust object grasping in clutter is a crucial component of
robotics. Most current works resort to the whole observed point cloud for 6-Dof
grasp generation, ignoring the guidance information that can be mined from
global semantics, thus limiting high-quality grasp generation and real-time
performance. In this work, we show that the widely used heatmaps are
underestimated in the efficiency of 6-Dof grasp generation. Therefore, we
propose an effective local grasp generator combined with grasp heatmaps as
guidance, which infers in a global-to-local semantic-to-point way.
Specifically, Gaussian encoding and the grid-based strategy are applied to
predict grasp heatmaps as guidance to aggregate local points into graspable
regions and provide global semantic information. Further, a novel non-uniform
anchor sampling mechanism is designed to improve grasp accuracy and diversity.
Benefiting from the high-efficiency encoding in the image space and focusing on
points in local graspable regions, our framework can perform high-quality grasp
detection in real-time and achieve state-of-the-art results. In addition, real
robot experiments demonstrate the effectiveness of our method with a success
rate of 94% and a clutter completion rate of 100%. Our code is available at
https://github.com/THU-VCLab/HGGD.
comment: Extensive results on GraspNet-1B dataset
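Gaussian encoding of grasp heatmaps, as mentioned above, can be sketched generically; the grid size and sigma below are arbitrary choices, not the paper's settings:

```python
import numpy as np

def gaussian_heatmap(centers, H=64, W=64, sigma=2.0):
    """Encode grasp centers as a heatmap of 2D Gaussians, max-combined so
    overlapping grasps keep their individual peaks."""
    ys, xs = np.mgrid[0:H, 0:W]
    heat = np.zeros((H, W))
    for (cy, cx) in centers:
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)
    return heat

heat = gaussian_heatmap([(10, 10), (40, 50)])
print(heat[10, 10], heat[40, 50])      # peaks are exactly 1 at the grasp centers
```

Such a heatmap lets the subsequent local generator focus only on high-response regions instead of the whole point cloud, which is the efficiency argument made in the abstract.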
☆ Bridging the Gap: Regularized Reinforcement Learning for Improved Classical Motion Planning with Safety Modules
Classical navigation planners can provide safe navigation, albeit often
suboptimally and with hindered human norm compliance. ML-based, contemporary
autonomous navigation algorithms can imitate more natural and human-compliant
navigation, but usually require large and realistic datasets and do not always
provide safety guarantees. We present an approach that leverages a classical
algorithm to guide reinforcement learning. This greatly improves the results
and convergence rate of the underlying RL algorithm and requires no
human-expert demonstrations to jump-start the process. Additionally, we
incorporate a practical fallback system that can switch back to a classical
planner to ensure safety. The outcome is a sample efficient ML approach for
mobile navigation that builds on classical algorithms, improves them to ensure
human compliance, and guarantees safety.
comment: 8 pages
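The fallback system described above, switching back to a classical planner when the learned action is deemed unsafe, reduces to a simple wrapper; the speed-based safety check below is a hypothetical stand-in for whatever module the paper actually uses:

```python
def safe_action(rl_action, classical_action, is_safe):
    """Fallback wrapper: use the learned action only when the safety module
    approves it; otherwise fall back to the classical planner's action."""
    return rl_action if is_safe(rl_action) else classical_action

# Hypothetical safety rule: forbid speeds above 1.0 m/s near obstacles.
is_safe = lambda a: abs(a["v"]) <= 1.0

print(safe_action({"v": 2.5}, {"v": 0.5}, is_safe))   # unsafe -> falls back
print(safe_action({"v": 0.8}, {"v": 0.5}, is_safe))   # safe -> learned action kept
```

Because the classical planner is always available as a backstop, the safety guarantee does not depend on the learned policy at all.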
☆ CoBOS: Constraint-Based Online Scheduler for Human-Robot Collaboration
Assembly processes involving humans and robots are challenging scenarios
because the individual activities and access to shared workspace have to be
coordinated. Fixed robot programs leave no room to diverge from a fixed
protocol. Working on such a process can be stressful for the user and lead to
ineffective behavior or failure. We propose CoBOS, a novel approach to online
constraint-based scheduling within a reactive execution control framework based
on behavior trees. This allows the robot to adapt to
uncertain events such as delayed activity completions and activity selection
(by the human). The user will experience less stress as the robotic coworkers
adapt their behavior to best complement the human-selected activities to
complete the common task. In addition to the improved working conditions, our
algorithm leads to increased efficiency, even in highly uncertain scenarios. We
evaluate our algorithm using a probabilistic simulation study with 56000
experiments. We outperform all baselines by a margin of 4-10%. Initial real
robot experiments using a Franka Emika Panda robot and human tracking based on
HTC Vive VR gloves look promising.
comment: 7 pages, 8 figures
☆ Inverse kinematics learning of a continuum manipulator using limited real time data
Data-driven control of a continuum manipulator requires a lot of data for
training, but generating a sufficient amount of real-time data is not
cost-efficient. Random actuation of the manipulator can also be unsafe.
Meta-learning has been used successfully to adapt to new environments; hence,
this paper addresses the above-mentioned problem using meta-learning. We
consider two cases. First, this paper proposes a method to train the model on
simulation data using MAML (Model-Agnostic Meta-Learning) and then adapt to the
real world with a few gradient steps. Secondly, if a simulation model is not
available or is difficult to formulate, we propose a CGAN (Conditional
Generative Adversarial Network)-MAML-based method. The model is trained using a
small amount of real-time data and augmented data for different loading
conditions, and adaptation is then done in the real environment. The
experiments show that the relative positioning error in both cases is below 3%.
The proposed models are experimentally verified on a real continuum
manipulator.
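The MAML-style train-then-adapt loop can be illustrated with a first-order MAML (FOMAML) toy sketch on scalar linear tasks. This is a drastic simplification for illustration, not the paper's model or data:

```python
import numpy as np

def mse_grad(w, x, y):
    """Gradient of 0.5 * mean((w*x - y)^2) w.r.t. the scalar weight w."""
    return np.mean((w * x - y) * x)

def fomaml(task_slopes, w=0.0, inner_lr=0.1, outer_lr=0.1, steps=200):
    """First-order MAML on toy regression tasks y = a * x (a varies per task)."""
    x = np.linspace(-1, 1, 16)
    for _ in range(steps):
        outer = 0.0
        for a in task_slopes:
            y = a * x
            w_adapted = w - inner_lr * mse_grad(w, x, y)   # inner adaptation step
            outer += mse_grad(w_adapted, x, y)             # first-order meta-gradient
        w -= outer_lr * outer / len(task_slopes)
    return w

w_meta = fomaml(task_slopes=[0.5, 1.5])        # meta-train on simulated tasks
x = np.linspace(-1, 1, 16)
y = 1.5 * x                                    # stand-in for the "real" system
loss = lambda w: np.mean(0.5 * (w * x - y) ** 2)
w_new = w_meta - 0.1 * mse_grad(w_meta, x, y)  # one real-world gradient step
print(loss(w_new) < loss(w_meta))
```

The point of the sketch is the two-loop structure: meta-training produces an initialization from which a single gradient step on scarce real data already improves the fit.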
☆ SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model CVPR 2024
There are five types of trajectory prediction tasks: deterministic,
stochastic, domain adaptation, momentary observation, and few-shot. These
associated tasks are defined by various factors, such as the length of input
paths, data split and pre-processing methods. Interestingly, even though they
commonly take sequential coordinates of observations as input and infer future
paths in the same coordinates as output, designing specialized architectures
for each task is still necessary. Applying a model specialized for one task to
another raises generality issues that lead to sub-optimal performance. In this paper, we propose SingularTrajectory,
a diffusion-based universal trajectory prediction framework to reduce the
performance gap across the five tasks. The core of SingularTrajectory is to
unify a variety of human dynamics representations on the associated tasks. To
do this, we first build a Singular space to project all types of motion
patterns from each task into one embedding space. We next propose an adaptive
anchor working in the Singular space. Unlike traditional fixed anchor methods
that sometimes yield unacceptable paths, our adaptive anchor enables correcting
anchors that are placed at wrong locations, based on a traversability map.
Finally, we adopt a diffusion-based predictor to further enhance the prototype
paths using a cascaded denoising process. Our unified framework ensures the
generality across various benchmark settings such as input modality and
trajectory length. Extensive experiments on five public benchmarks demonstrate
that SingularTrajectory substantially outperforms existing models, highlighting
its effectiveness in estimating general dynamics of human movements. Code is
publicly available at https://github.com/inhwanbae/SingularTrajectory .
comment: Accepted at CVPR 2024
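One plausible reading of projecting all motion patterns into a shared "Singular space" is a low-rank SVD basis over flattened trajectories; this sketch rests on that assumption, and the paper's actual construction may differ:

```python
import numpy as np

def build_singular_space(trajs, dim=2):
    """Fit a low-rank linear basis to flattened trajectories via SVD and
    return projection / reconstruction maps through it."""
    X = trajs.reshape(len(trajs), -1)               # (N, T*2)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:dim]                                # top right-singular vectors
    project = lambda t: (t.reshape(-1) - mean) @ basis.T
    reconstruct = lambda z: (z @ basis + mean).reshape(trajs.shape[1:])
    return project, reconstruct

rng = np.random.default_rng(0)
# 50 toy trajectories of 8 (x, y) steps lying in a 2-parameter family
t = np.linspace(0, 1, 8)
trajs = np.stack([np.stack([a * t, b * t * t], axis=1)
                  for a, b in rng.uniform(0.5, 2.0, size=(50, 2))])
project, reconstruct = build_singular_space(trajs, dim=2)
err = np.linalg.norm(reconstruct(project(trajs[0])) - trajs[0])
print(err)                                          # near zero: family is rank 2
```

Once every task's trajectories live in one low-dimensional embedding, a single predictor (here it would be the diffusion model) can serve all five task settings.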
☆ Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction CVPR 2024
Language models have demonstrated impressive ability in context understanding
and generative performance. Inspired by the recent success of language
foundation models, in this paper, we propose LMTraj (Language-based Multimodal
Trajectory predictor), which recasts the trajectory prediction task into a sort
of question-answering problem. Departing from traditional numerical regression
models, which treat the trajectory coordinate sequence as continuous signals,
we consider them as discrete signals like text prompts. Specifically, we first
transform an input space for the trajectory coordinate into the natural
language space. Here, the entire time-series trajectories of pedestrians are
converted into a text prompt, and scene images are described as text
information through image captioning. The transformed numerical and image data
are then wrapped into the question-answering template for use in a language
model. Next, to guide the language model in understanding and reasoning
high-level knowledge, such as scene context and social relationships between
pedestrians, we introduce auxiliary multi-task question answering. We
then train a numerical tokenizer with the prompt data. We encourage the
tokenizer to separate the integer and decimal parts well, and leverage it to
capture correlations between the consecutive numbers in the language model.
Lastly, we train the language model using the numerical tokenizer and all of
the question-answer prompts. Here, we propose a beam-search-based most-likely
prediction and a temperature-based multimodal prediction to implement both
deterministic and stochastic inferences. Applying our LMTraj, we show that the
language-based model can be a powerful pedestrian trajectory predictor, and
outperforms existing numerical-based predictor methods. Code is publicly
available at https://github.com/inhwanbae/LMTrajectory .
comment: Accepted at CVPR 2024
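The numerical tokenizer that separates integer and decimal parts can be sketched as simple string splitting; the exact token format below (separators, two decimal places) is illustrative, not LMTraj's actual vocabulary:

```python
def tokenize_coordinate(value):
    """Split a coordinate into integer-part, separator, and decimal-part tokens
    so a language model sees the two parts as distinct units."""
    sign = "-" if value < 0 else ""
    integer, _, decimal = f"{abs(value):.2f}".partition(".")
    return [sign + integer, ".", decimal]

def tokenize_trajectory(points):
    """Serialize a list of (x, y) points into a flat token sequence."""
    tokens = []
    for (x, y) in points:
        tokens += tokenize_coordinate(x) + [","] + tokenize_coordinate(y) + [";"]
    return tokens

print(tokenize_trajectory([(1.25, -3.4)]))
```

Keeping integer and decimal parts as separate tokens is what lets the model treat nearby coordinates as related symbols rather than unrelated opaque strings.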
☆ HyRRT-Connect: A Bidirectional Rapidly-Exploring Random Trees Motion Planning Algorithm for Hybrid Systems
This paper proposes a bidirectional rapidly-exploring random trees (RRT)
algorithm to solve the motion planning problem for hybrid systems. The proposed
algorithm, called HyRRT-Connect, propagates in both forward and backward
directions in hybrid time until an overlap between the forward and backward
propagation results is detected. Then, HyRRT-Connect constructs a motion plan
through the reversal and concatenation of functions defined on hybrid time
domains, ensuring the motion plan thoroughly satisfies the given hybrid
dynamics. To address the potential discontinuity along the flow caused by
tolerating some distance between the forward and backward partial motion plans,
we reconstruct the backward partial motion plan by a forward-in-hybrid-time
simulation from the final state of the forward partial motion plan. By applying
the reversed input of the backward partial motion plan, the reconstruction
process effectively eliminates the discontinuity and ensures that as the
tolerance distance decreases to zero, the distance between the endpoint of the
reconstructed motion plan and the final state set approaches zero. The proposed
algorithm is applied to an actuated bouncing ball example and a walking robot
example so as to highlight its generality and computational improvement.
comment: Accepted by the 8th IFAC International Conference on Analysis and
Design of Hybrid Systems (ADHS 2024)
☆ Extensible Hook System for Rendezvous and Docking of a Cubesat Swarm
The use of cubesat swarms is being proposed for different missions where
cooperation between satellites is required. Commonly, the cubesat swarm
requires formation flight and even rendezvous and docking, which are very
challenging tasks since they require more energy and the use of advanced
guidance, navigation and control techniques. In this paper, we propose the use
of an extensible hook system to mitigate these drawbacks, i.e., it saves fuel
and reduces system complexity by including techniques that have been previously
demonstrated on Earth. This system is based on a scissor boom structure, which
can reach up to five meters from a 4U stowed volume, and includes three degrees
of freedom to place the end effector at any pose within the system workspace.
We simulated the dynamic behaviour of a cubesat with the proposed system,
demonstrating that the power required for a 16U cubesat equipped with one
extensible hook system is acceptable given current state-of-the-art actuators.
☆ Imaging radar and LiDAR image translation for 3-DOF extrinsic calibration
The integration of sensor data is crucial in the field of robotics to take
full advantage of the various sensors employed. One critical aspect of this
integration is determining the extrinsic calibration parameters, such as the
relative transformation, between each sensor. The use of data fusion between
complementary sensors, such as radar and LiDAR, can provide significant
benefits, particularly in harsh environments where accurate depth data is
required. However, noise included in radar sensor data can make the estimation
of extrinsic calibration challenging. To address this issue, we present a novel
framework for the extrinsic calibration of radar and LiDAR sensors, utilizing
CycleGAN as amethod of image-to-image translation. Our proposed method employs
translating radar bird-eye-view images into LiDAR-style images to estimate the
3-DOF extrinsic parameters. The use of image registration techniques, as well
as deskewing based on sensor odometry and B-spline interpolation, is employed
to address the rolling shutter effect commonly present in spinning sensors. Our
method demonstrates a notable improvement in extrinsic calibration compared to
filter-based methods using the MulRan dataset.
☆ RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation ICRA 2024
Estimating robot pose and joint angles is significant in advanced robotics,
enabling applications like robot collaboration and online hand-eye calibration.
However, the introduction of unknown joint angles makes prediction more complex
than simple robot pose estimation, due to its higher dimensionality. Previous
methods either regress 3D keypoints directly or utilise a render&compare
strategy. These approaches often falter in terms of performance or efficiency
and grapple with the cross-camera gap problem. This paper presents a novel
framework that bifurcates the high-dimensional prediction task into two
manageable subtasks: 2D keypoints detection and lifting 2D keypoints to 3D.
This separation promises enhanced performance without sacrificing the
efficiency innate to keypoint-based techniques. A vital component of our method
is the lifting of 2D keypoints to 3D keypoints. Common deterministic regression
methods may falter when faced with uncertainties from 2D detection errors or
self-occlusions. Leveraging the robust modeling potential of diffusion models,
we reframe this issue as a conditional 3D keypoints generation task. To bolster
cross-camera adaptability, we introduce the Normalised Camera Coordinate Space
(NCCS), ensuring alignment of estimated 2D keypoints across varying camera
intrinsics. Experimental results demonstrate that the proposed method
outperforms the state-of-the-art render&compare method and achieves higher
inference speed. Furthermore, the tests accentuate our method's robust
cross-camera generalisation capabilities. We intend to release both the dataset
and code at https://nimolty.github.io/Robokeygen/
comment: Accepted by ICRA 2024
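A plausible reading of the Normalised Camera Coordinate Space is dividing out the camera intrinsics so that 2D keypoints become comparable across cameras. The definition below is my assumption, not taken from the paper:

```python
import numpy as np

def to_normalised_camera_coords(keypoints_2d, fx, fy, cx, cy):
    """Map pixel keypoints to intrinsics-independent normalised coordinates:
    x_n = (u - cx) / fx,  y_n = (v - cy) / fy."""
    uv = np.asarray(keypoints_2d, dtype=float)
    return np.stack([(uv[:, 0] - cx) / fx, (uv[:, 1] - cy) / fy], axis=1)

# The same viewing ray observed by two cameras with different intrinsics
kp_cam_a = [[380.0, 270.0]]
kp_cam_b = [[760.0, 420.0]]
n_a = to_normalised_camera_coords(kp_cam_a, fx=600, fy=600, cx=320, cy=240)
n_b = to_normalised_camera_coords(kp_cam_b, fx=1200, fy=1200, cx=640, cy=360)
print(n_a, n_b)    # identical normalised coordinates despite different cameras
```

Feeding such intrinsics-free coordinates to the 2D-to-3D lifting stage is one way the cross-camera gap the abstract mentions could be closed.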
☆ Manipulating Neural Path Planners via Slight Perturbations
Data-driven neural path planners are attracting increasing interest in the
robotics community. However, their neural network components typically come as
black boxes, obscuring their underlying decision-making processes. Their
black-box nature exposes them to the risk of being compromised via the
insertion of hidden malicious behaviors. For example, an attacker may hide
behaviors that, when triggered, hijack a delivery robot by guiding it to a
specific (albeit wrong) destination, trapping it in a predefined region, or
inducing unnecessary energy expenditure by causing the robot to repeatedly
circle a region. In this paper, we propose a novel approach to specify and
inject a range of hidden malicious behaviors, known as backdoors, into neural
path planners. Our approach provides a concise but flexible way to define these
behaviors, and we show that hidden behaviors can be triggered by slight
perturbations (e.g., inserting a tiny unnoticeable object), that can
nonetheless significantly compromise their integrity. We also discuss potential
techniques to identify these backdoors aimed at alleviating such risks. We
demonstrate our approach on both sampling-based and search-based neural path
planners.
☆ Multi-AGV Path Planning Method via Reinforcement Learning and Particle Filters
The Reinforcement Learning (RL) algorithm, renowned for its robust learning
capability and search stability, has garnered significant attention and found
extensive application in Automated Guided Vehicle (AGV) path planning. However,
RL planning algorithms encounter challenges stemming from the substantial
variance of neural networks caused by environmental instability and significant
fluctuations in system structure. These challenges manifest in slow convergence
speed and low learning efficiency. To tackle this issue, this paper presents
the Particle Filter-Double Deep Q-Network (PF-DDQN) approach, which
incorporates the Particle Filter (PF) into multi-AGV reinforcement learning
path planning. The PF-DDQN method leverages the imprecise weight values of the
network as state values to formulate the state space equation. Through the
iterative fusion process of neural networks and particle filters, the DDQN
model is optimized to acquire the optimal true weight values, thus enhancing
the algorithm's efficiency. The proposed method's effectiveness and superiority
are validated through numerical simulations: the proposed algorithm surpasses
the traditional DDQN algorithm by 92.62% in path-planning quality and by 76.88%
in training time. In conclusion, by integrating the Particle Filter into the
DDQN model, PF-DDQN addresses the challenges RL planning algorithms encounter
in AGV path planning and achieves markedly higher efficiency.
☆ Uncertainty-Aware Deployment of Pre-trained Language-Conditioned Imitation Learning Policies
Large-scale robotic policies trained on data from diverse tasks and robotic
platforms hold great promise for enabling general-purpose robots; however,
reliable generalization to new environment conditions remains a major
challenge. Toward addressing this challenge, we propose a novel approach for
uncertainty-aware deployment of pre-trained language-conditioned imitation
learning agents. Specifically, we use temperature scaling to calibrate these
models and exploit the calibrated model to make uncertainty-aware decisions by
aggregating the local information of candidate actions. We implement our
approach in simulation using three such pre-trained models, and showcase its
potential to significantly enhance task completion rates. The accompanying code
is available at https://github.com/BobWu1998/uncertainty_quant_all.git
comment: 8 pages, 7 figures
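Temperature scaling, the calibration technique named in the abstract above, can be sketched in a few lines. This is a minimal illustration on a toy classifier, not the paper's pipeline (which calibrates pre-trained language-conditioned policies); `fit_temperature` is a hypothetical helper using a grid search instead of the usual gradient-based fit:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nll(logits, labels, T):
    """Average negative log-likelihood of labels under temperature-scaled logits."""
    p = softmax(logits / T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 46)):
    """Pick the single scalar temperature minimizing held-out NLL (grid search)."""
    return min(grid, key=lambda T: nll(logits, labels, T))
```

For an overconfident model the fitted T typically exceeds 1, which softens the predicted distribution without changing the argmax decision.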
☆ Preference-Based Planning in Stochastic Environments: From Partially-Ordered Temporal Goals to Most Preferred Policies
Human preferences are not always represented via complete linear orders: It
is natural to employ partially-ordered preferences for expressing incomparable
outcomes. In this work, we consider decision-making and probabilistic planning
in stochastic systems modeled as Markov decision processes (MDPs), given a
partially ordered preference over a set of temporally extended goals.
Specifically, each temporally extended goal is expressed using a formula in
Linear Temporal Logic on Finite Traces (LTL$_f$). To plan with the partially
ordered preference, we introduce order theory to map a preference over temporal
goals to a preference over policies for the MDP. Accordingly, a most preferred
policy under a stochastic ordering induces a stochastically nondominated
probability distribution over the finite paths in the MDP. To synthesize a most
preferred policy, our technical approach includes two key steps. In the first
step, we develop a procedure to transform a partially ordered preference over
temporal goals into a computational model, called preference automaton, which
is a semi-automaton with a partial order over acceptance conditions. In the
second step, we prove that finding a most preferred policy is equivalent to
computing a Pareto-optimal policy in a multi-objective MDP that is constructed
from the original MDP, the preference automaton, and the chosen stochastic
ordering relation. Throughout the paper, we employ running examples to
illustrate the proposed preference specification and solution approaches. We
demonstrate the efficacy of our algorithm using these examples, providing
detailed analysis, and then discuss several potential future directions.
comment: arXiv admin note: substantial text overlap with arXiv:2209.12267
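The second step above reduces preference-based planning to finding Pareto-optimal policies in a multi-objective MDP. As a hedged illustration of the Pareto-optimality criterion involved (hypothetical value vectors, not the paper's synthesis algorithm), nondominated candidates can be filtered like this:

```python
import numpy as np

def dominates(u, v):
    """u Pareto-dominates v: at least as good in every objective, strictly better in one."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_front(values):
    """Indices of nondominated value vectors (e.g., one per candidate policy)."""
    return [i for i, u in enumerate(values)
            if not any(dominates(v, u) for j, v in enumerate(values) if j != i)]
```

Here each vector would hold one satisfaction probability per temporal goal; the front contains exactly the candidates no other candidate improves upon in all objectives.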
☆ Long and Short-Term Constraints Driven Safe Reinforcement Learning for Autonomous Driving
Reinforcement learning (RL) has been widely used in decision-making tasks,
but it cannot guarantee the agent's safety during training due to the
requirement of interacting with the environment, which seriously limits its
industrial applications such as autonomous driving. Safe RL methods are
developed to handle this issue by constraining the expected safety violation
costs as a training objective, but they still permit unsafe state occurrence,
which is unacceptable in autonomous driving tasks. Moreover, these methods
struggle to balance the cost and return expectations, which degrades the
algorithms' learning performance. In this paper, we
propose a novel algorithm based on the long and short-term constraints (LSTC)
for safe RL. The short-term constraint aims to guarantee the short-term state
safety that the vehicle explores, while the long-term constraint ensures the
overall safety of the vehicle throughout the decision-making process. In
addition, we develop a safe RL method with dual-constraint optimization based
on the Lagrange multiplier to optimize the training process for end-to-end
autonomous driving. Comprehensive experiments were conducted on the MetaDrive
simulator. Experimental results demonstrate that the proposed method achieves
higher safety in continuous state and action tasks, and exhibits higher
exploration performance in long-distance decision-making tasks compared with
state-of-the-art methods.
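The Lagrange-multiplier scheme mentioned above can be sketched generically. This is an illustrative dual-ascent step with hypothetical names, not the paper's exact LSTC formulation: the multiplier grows while the safety constraint is violated and shrinks (staying nonnegative) once the average cost is within its limit.

```python
import numpy as np

def dual_ascent_step(lmbda, avg_cost, cost_limit, lr=0.05):
    """One projected gradient-ascent step on the Lagrange multiplier."""
    return max(0.0, lmbda + lr * (avg_cost - cost_limit))

def lagrangian_objective(returns, costs, lmbda):
    """Policy objective: expected return minus lambda-weighted constraint cost."""
    return float(np.mean(returns) - lmbda * np.mean(costs))
```

Alternating policy updates on `lagrangian_objective` with `dual_ascent_step` on the multiplier is the standard template such dual-constraint methods build on.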
☆ Road Obstacle Detection based on Unknown Objectness Scores ICRA 2024
The detection of unknown traffic obstacles is vital to ensure safe autonomous
driving. The standard object-detection methods cannot identify unknown objects
that are not included under predefined categories. This is because
object-detection methods are trained to assign a background label to pixels
corresponding to the presence of unknown objects. To address this problem, the
pixel-wise anomaly-detection approach has attracted increased research
attention. Anomaly-detection techniques, such as uncertainty estimation and
perceptual difference from reconstructed images, make it possible to identify
pixels of unknown objects as out-of-distribution (OoD) samples. However, when
applied to images with many unknowns and complex components, such as driving
scenes, these methods often exhibit unstable performance. The purpose of this
study is to achieve stable performance for detecting unknown objects by
incorporating object-detection approaches into pixel-wise anomaly-detection
methods. To achieve this goal, we adopt a semantic-segmentation
network with a sigmoid head that simultaneously provides pixel-wise anomaly
scores and objectness scores. Our experimental results show that the objectness
scores play an important role in improving the detection performance. Based on
these results, we propose a novel anomaly score by integrating these two
scores, which we term the unknown objectness score. Quantitative evaluations
show that the proposed method outperforms state-of-the-art methods when applied
to publicly available datasets.
comment: ICRA 2024
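The abstract does not spell out the exact fusion rule, but one simple way to combine a pixel-wise anomaly score with an objectness score from a sigmoid head is a product of probabilities. This is purely illustrative; the paper's actual combination may differ:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def unknown_objectness(anomaly_logits, objectness_logits):
    """Illustrative fusion: a pixel scores high only when it is both anomalous
    (out-of-distribution) and object-like, suppressing background false alarms."""
    return sigmoid(anomaly_logits) * sigmoid(objectness_logits)
```

The intuition matches the abstract: an anomalous pixel that is not object-like (e.g., texture noise on the road surface) is down-weighted rather than flagged as an unknown obstacle.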
☆ Sailing Through Point Clouds: Safe Navigation Using Point Cloud Based Control Barrier Functions
The capability to navigate safely in an unstructured environment is crucial
when deploying robotic systems in real-world scenarios. Recently, control
barrier function (CBF) based approaches have been highly effective in
synthesizing safety-critical controllers. In this work, we propose a novel
CBF-based local planner comprised of two components: Vessel and Mariner. The
Vessel is a novel scaling factor based CBF formulation that synthesizes CBFs
using only point cloud data. The Mariner is a CBF-based preview control
framework that is used to mitigate getting stuck in spurious equilibria during
navigation. To demonstrate the efficacy of our proposed approach, we first
compare the proposed point cloud based CBF formulation with other point cloud
based CBF formulations. Then, we demonstrate the performance of our proposed
approach and its integration with global planners using experimental studies on
the Unitree B1 and Unitree Go2 quadruped robots in various environments.
☆ LocoMan: Advancing Versatile Quadrupedal Dexterity with Lightweight Loco-Manipulators
Changyi Lin, Xingyu Liu, Yuxiang Yang, Yaru Niu, Wenhao Yu, Tingnan Zhang, Jie Tan, Byron Boots, Ding Zhao
Quadrupedal robots have emerged as versatile agents capable of locomoting and
manipulating in complex environments. Traditional designs typically rely on the
robot's inherent body parts or incorporate top-mounted arms for manipulation
tasks. However, these configurations may limit the robot's operational
dexterity, efficiency and adaptability, particularly in cluttered or
constrained spaces. In this work, we present LocoMan, a dexterous quadrupedal
robot with a novel morphology to perform versatile manipulation in diverse
constrained environments. By equipping a Unitree Go1 robot with two low-cost
and lightweight modular 3-DoF loco-manipulators on its front calves, LocoMan
leverages the combined mobility and functionality of the legs and grippers for
complex manipulation tasks that require precise 6D positioning of the end
effector in a wide workspace. To harness the loco-manipulation capabilities of
LocoMan, we introduce a unified control framework that extends the whole-body
controller (WBC) to integrate the dynamics of loco-manipulators. Through
experiments, we validate that the proposed whole-body controller can accurately
and stably follow desired 6D trajectories of the end effector and torso, which,
when combined with the large workspace from our design, facilitates a diverse
set of challenging dexterous loco-manipulation tasks in confined spaces, such
as opening doors, plugging into sockets, picking objects in narrow and
low-lying spaces, and bimanual manipulation.
comment: Project page: https://linchangyi1.github.io/LocoMan
☆ SCANet: Correcting LEGO Assembly Errors with Self-Correct Assembly Network
Autonomous assembly in robotics and 3D vision presents significant
challenges, particularly in ensuring assembly correctness. Presently,
predominant methods such as MEPNet focus on assembling components based on
manually provided images. However, these approaches often fall short in
achieving satisfactory results for tasks requiring long-term planning.
Concurrently, we observe that integrating a self-correction module can
partially alleviate such issues. Motivated by this concern, we introduce the
single-step assembly error correction task, which involves identifying and
rectifying misassembled components. To support research in this area, we
present the LEGO Error Correction Assembly Dataset (LEGO-ECA), comprising
manual images for assembly steps and instances of assembly failures.
Additionally, we propose the Self-Correct Assembly Network (SCANet), a novel
method to address this task. SCANet treats assembled components as queries,
determining their correctness in manual images and providing corrections when
necessary. Finally, we utilize SCANet to correct the assembly results of
MEPNet. Experimental results demonstrate that SCANet can identify and correct
MEPNet's misassembled results, significantly improving the correctness of
assembly. Our code and dataset are available at
https://github.com/Yaser-wyx/SCANet.
☆ Online Embedding Multi-Scale CLIP Features into 3D Maps
This study introduces a novel approach to online embedding of multi-scale
CLIP (Contrastive Language-Image Pre-Training) features into 3D maps. By
harnessing CLIP, this methodology surpasses the constraints of conventional
vocabulary-limited methods and enables the incorporation of semantic
information into the resultant maps. While recent approaches have explored the
embedding of multi-modal features in maps, they often impose significant
computational costs, lacking practicality for exploring unfamiliar environments
in real time. Our approach tackles these challenges by efficiently computing
and embedding multi-scale CLIP features, thereby facilitating the exploration
of unfamiliar environments through real-time map generation. Moreover,
embedding CLIP features into the resultant maps makes offline retrieval via
linguistic queries feasible. In essence, our approach simultaneously achieves
real-time object search and mapping of unfamiliar environments. Additionally,
we propose a zero-shot object-goal navigation system based on our mapping
approach, and we validate its efficacy through object-goal navigation, offline
object retrieval, and multi-object-goal navigation in both simulated
environments and real robot experiments. The findings demonstrate that our
method not only runs faster than state-of-the-art mapping
methods but also surpasses them in terms of the success rate of object-goal
navigation tasks.
comment: 8 pages, 7 figures
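Offline retrieval via linguistic queries, as described above, amounts to ranking map cells by similarity between their stored features and a text embedding. A minimal sketch, assuming CLIP embeddings have already been computed (the arrays and the `retrieve` helper are placeholders, not the paper's API):

```python
import numpy as np

def retrieve(query_embedding, map_features, top_k=3):
    """Rank map cells by cosine similarity between stored features and the query."""
    q = query_embedding / (np.linalg.norm(query_embedding) + 1e-9)
    f = map_features / (np.linalg.norm(map_features, axis=-1, keepdims=True) + 1e-9)
    sims = f @ q  # one cosine similarity per map cell
    return np.argsort(-sims)[:top_k]
```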
☆ Vision-Based Force Estimation for Minimally Invasive Telesurgery Through Contact Detection and Local Stiffness Models
In minimally invasive telesurgery, obtaining accurate force information is
difficult due to the complexities of in-vivo end effector force sensing. This
constrains the development and implementation of haptic feedback and force-based
automated performance metrics. Vision-based force sensing
approaches using deep learning are a promising alternative to intrinsic end
effector force sensing. However, they have limited ability to generalize to
novel scenarios, and require learning on high-quality force sensor training
data that can be difficult to obtain. To address these challenges, this paper
presents a novel vision-based contact-conditional approach for force estimation
in telesurgical environments. Our method leverages supervised learning with
human labels and end effector position data to train deep neural networks.
Predictions from these trained models are optionally combined with robot joint
torque information to estimate forces indirectly from visual data. We benchmark
our method against ground truth force sensor data and demonstrate generality by
fine-tuning to novel surgical scenarios in a data-efficient manner. Our methods
demonstrated greater than 90% accuracy on contact detection and less than 10%
force prediction error. These results suggest potential usefulness of
contact-conditional force estimation for sensory substitution haptic feedback
and tissue handling skill evaluation in clinical settings.
comment: Preprint of an article accepted in Journal of Medical Robotics
Research \copyright 2024 copyright World Scientific Publishing Company
♻ ☆ SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields CVPR 2024
Quentin Herau, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Cyrille Migniot, Pascal Vasseur, Cédric Demonceaux
In rapidly-evolving domains such as autonomous driving, the use of multiple
sensors with different modalities is crucial to ensure high operational
precision and stability. To correctly exploit the information provided by each
sensor in a single common frame, it is essential for these sensors to be
accurately calibrated. In this paper, we leverage the ability of Neural
Radiance Fields (NeRF) to represent different sensor modalities in a common
volumetric representation to achieve robust and accurate spatio-temporal sensor
calibration. By designing a partitioning approach based on the visible part of
the scene for each sensor, we formulate the calibration problem using only the
overlapping areas. This strategy results in a more robust and accurate
calibration that is less prone to failure. We demonstrate that our approach
works on outdoor urban scenes by validating it on multiple established driving
datasets. Results show that our method achieves better accuracy and robustness
than existing methods.
comment: Accepted at CVPR 2024. Project page: https://qherau.github.io/SOAC/
♻ ☆ Sim-to-Real gap in RL: Use Case with TIAGo and Isaac Sim/Gym
This paper explores policy-learning approaches in the context of sim-to-real
transfer for robotic manipulation using a TIAGo mobile manipulator, focusing on
two state-of-art simulators, Isaac Gym and Isaac Sim, both developed by Nvidia.
Control architectures are discussed, with a particular emphasis on achieving
collision-less movement in both simulation and the real environment. Presented
results demonstrate successful sim-to-real transfer, showcasing similar
movements executed by an RL-trained model in both simulated and real setups.
comment: Accepted in ERF24 workshop "Towards Efficient and Portable Robot
Learning for Real-World Settings". To be published in Springer Proceedings in
Advanced Robotics
♻ ☆ Modeling and Control of Intrinsically Elasticity Coupled Soft-Rigid Robots
While much work has been done recently in the realm of model-based control of
soft robots and soft-rigid hybrids, most works examine robots that have an
inherently serial structure. While these systems have been prevalent in the
literature, there is an increasing trend toward designing soft-rigid hybrids
with intrinsically coupled elasticity between various degrees of freedom. In
this work, we seek to address the issues of modeling and controlling such
structures, particularly when underactuated. We introduce several simple models
for elastic coupling, typical of those seen in these systems. We then propose a
controller that compensates for the elasticity, and we prove its stability with
Lyapunov methods without relying on the elastic dominance assumption. This
controller is applicable to the general class of underactuated soft robots.
After evaluating the controller in simulated cases, we then develop a simple
hardware platform to evaluate both the models and the controller. Finally,
using the hardware, we demonstrate a novel use case for underactuated,
elastically coupled systems in "sensorless" force control.
comment: 7 pages, 8 figures
♻ ☆ Safe Control for Soft-Rigid Robots with Self-Contact using Control Barrier Functions
Incorporating both flexible and rigid components in robot designs offers a
unique solution to the limitations of traditional rigid robotics by enabling
both compliance and strength. This paper explores the challenges and solutions
for controlling soft-rigid hybrid robots, particularly addressing the issue of
self-contact. Conventional control methods prioritize precise state tracking,
inadvertently increasing the system's overall stiffness, which is not always
desirable in interactions with the environment or within the robot itself. To
address this, we investigate the application of Control Barrier Functions
(CBFs) and High Order CBFs to manage self-contact scenarios in serially
connected soft-rigid hybrid robots. Through an analysis based on Piecewise
Constant Curvature (PCC) kinematics, we establish CBFs within a classical
control framework for self-contact dynamics. Our methodology is rigorously
evaluated in both simulation environments and physical hardware systems. The
findings demonstrate that our proposed control strategy effectively regulates
self-contact in soft-rigid hybrid robotic systems, marking a significant
advancement in the field of robotics.
comment: 6 pages, 6 figures, submitted to IEEE Robosoft 2024 Conference
♻ ☆ DRIVE: Data-driven Robot Input Vector Exploration ICRA2024
Dominic Baril, Simon-Pierre Deschênes, Luc Coupal, Cyril Goffin, Julien Lépine, Philippe Giguère, François Pomerleau
An accurate motion model is a fundamental component of most autonomous
navigation systems. While much work has been done on improving model
formulation, no standard protocol exists for gathering empirical data required
to train models. In this work, we address this issue by proposing Data-driven
Robot Input Vector Exploration (DRIVE), a protocol that enables characterizing
the input limits of uncrewed ground vehicles (UGVs) and gathering empirical model
training data. We also propose a novel learned slip approach outperforming
similar acceleration learning approaches. Our contributions are validated
through an extensive experimental evaluation, accumulating over 7 km and 1.8 h of
driving data over three distinct UGVs and four terrain types. We show that our
protocol offers increased predictive performance over common human-driven
data-gathering protocols. Furthermore, our protocol converges with 46 s of
training data, almost four times less than the shortest human dataset gathering
protocol. We show that the operational limit for our model is reached in
extreme slip conditions encountered on surfaced ice. DRIVE is an efficient way
of characterizing UGV motion in its operational conditions. Our code and
dataset are both available online at https://github.com/norlab-ulaval/DRIVE.
comment: 8 pages, 7 figures, 1 table, accepted for publication at the 2024
IEEE International Conference on Robotics and Automation (ICRA2024),
Yokohama, Japan
♻ ☆ Frontier-Based Exploration for Multi-Robot Rendezvous in Communication-Restricted Unknown Environments
Multi-robot rendezvous and exploration are fundamental challenges in the
domain of mobile robotic systems. This paper addresses multi-robot rendezvous
within an initially unknown environment where communication is only possible
after the rendezvous. Traditionally, exploration has been focused on rapidly
mapping the environment, often leading to suboptimal rendezvous performance in
later stages. We adapt a standard frontier-based exploration technique to
integrate exploration and rendezvous into a unified strategy, with a mechanism
that allows robots to re-visit previously explored regions thus enhancing
rendezvous opportunities. We validate our approach in 3D realistic simulations
using ROS, showcasing its effectiveness in achieving faster rendezvous times
compared to exploration strategies.
♻ ☆ Natural-artificial hybrid swarm: Cyborg-insect group navigation in unknown obstructed soft terrain
Yang Bai, Phuoc Thanh Tran Ngoc, Huu Duoc Nguyen, Duc Long Le, Quang Huy Ha, Kazuki Kai, Yu Xiang See To, Yaosheng Deng, Jie Song, Naoki Wakamiya, Hirotaka Sato, Masaki Ogura
Navigating multi-robot systems in complex terrains has always been a
challenging task. This is due to the inherent limitations of traditional robots
in collision avoidance, adaptation to unknown environments, and sustained
energy efficiency. In order to overcome these limitations, this research
proposes a solution by integrating living insects with miniature electronic
controllers to enable robotic-like programmable control, and proposing a novel
control algorithm for swarming. Although these creatures, called cyborg
insects, have the ability to instinctively avoid collisions with neighbors and
obstacles while adapting to complex terrains, there is a lack of literature on
the control of multi-cyborg systems. This research gap is due to the difficulty
in coordinating the movements of a cyborg system in the presence of insects'
inherent individual variability in their reactions to control input. In
response to this issue, we propose a novel swarm navigation algorithm
addressing these challenges. The effectiveness of the algorithm is demonstrated
through an experimental validation in which a cyborg swarm was successfully
navigated through an unknown sandy field with obstacles and hills. This
research contributes to the domain of swarm robotics and showcases the
potential of integrating biological organisms with robotics and control theory
to create more intelligent autonomous systems with real-world applications.
♻ ☆ Polygonal Cone Control Barrier Functions (PolyC2BF) for safe navigation in cluttered environments
In fields such as mining, search and rescue, and archaeological exploration,
ensuring real-time, collision-free navigation of robots in confined, cluttered
environments is imperative. Despite the value of established path planning
algorithms, they often face challenges in convergence rates and handling
dynamic infeasibilities. Alternative techniques like collision cones struggle
to accurately represent complex obstacle geometries. This paper introduces a
novel category of control barrier functions, known as Polygonal Cone Control
Barrier Function (PolyC2BF), which addresses overestimation and computational
complexity issues. The proposed PolyC2BF, formulated as a Quadratic Programming
(QP) problem, proves effective in facilitating collision-free movement of
multiple robots in complex environments. The efficacy of this approach is
further demonstrated through PyBullet simulations on a quadruped (unicycle
model) and a Crazyflie 2.1 (quadrotor model) in cluttered environments.
comment: 6 Pages, 6 Figures. Accepted at European Control Conference (ECC)
2024. arXiv admin note: text overlap with arXiv:2303.15871
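CBF-based planners like the one above are typically posed as a quadratic program: stay as close as possible to a nominal control while keeping ḣ(x) + αh(x) ≥ 0. For single-integrator dynamics with one affine constraint, that QP has a closed-form projection. This is a generic CBF safety filter for intuition, not the PolyC2BF formulation itself:

```python
import numpy as np

def cbf_qp_filter(u_nom, grad_h, h, alpha=1.0):
    """min ||u - u_nom||^2  s.t.  grad_h . u + alpha * h >= 0   (dynamics x' = u)."""
    u = np.asarray(u_nom, dtype=float)
    g = np.asarray(grad_h, dtype=float)
    slack = g @ u + alpha * h
    if slack >= 0:
        return u                      # nominal control is already safe
    return u - (slack / (g @ g)) * g  # project onto the constraint boundary
```

PolyC2BF's contribution is in the shape of the constraint set (polygonal cones instead of overestimating circular cones); the QP filtering structure is the same.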
♻ ☆ Risk-aware Control for Robots with Non-Gaussian Belief Spaces
This paper addresses the problem of safety-critical control of autonomous
robots, considering the ubiquitous uncertainties arising from unmodeled
dynamics and noisy sensors. To take into account these uncertainties,
probabilistic state estimators are often deployed to obtain a belief over
possible states. Namely, Particle Filters (PFs) can handle arbitrary
non-Gaussian distributions in the robot's state. In this work, we define the
belief state and belief dynamics for continuous-discrete PFs and construct safe
sets in the underlying belief space. We design a controller that provably keeps
the robot's belief state within this safe set. As a result, we ensure that the
risk of the unknown robot's state violating a safety specification, such as
avoiding a dangerous area, is bounded. We provide an open-source implementation
as a ROS2 package and evaluate the solution in simulations and hardware
experiments involving high-dimensional belief spaces.
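With a particle-filter belief as above, the risk that the unknown true state violates a safety specification can be estimated by the weighted fraction of particles outside the safe set. This is an illustrative sketch with hypothetical names, not the paper's controller:

```python
import numpy as np

def violation_risk(particles, weights, is_safe):
    """Estimated probability mass of the belief lying outside the safe set."""
    unsafe = np.array([not is_safe(p) for p in particles])
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()   # normalize importance weights
    return float(w[unsafe].sum())
```

Keeping this quantity below a threshold is one way to make the "bounded risk" guarantee in the abstract concrete: the controller acts on the belief, not on a single point estimate.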
♻ ☆ Learning Quadruped Locomotion Using Differentiable Simulation
While most recent advancements in legged robot control have been driven by
model-free reinforcement learning, we explore the potential of differentiable
simulation. Differentiable simulation promises faster convergence and more
stable training by computing low-variance first-order gradients using the robot
model, but so far, its use for legged robot control has remained limited to
simulation. The main challenge with differentiable simulation lies in the
complex optimization landscape of robotic tasks due to discontinuities in
contact-rich environments, e.g., quadruped locomotion. This work proposes a
new, differentiable simulation framework to overcome these challenges. The key
idea involves decoupling the complex whole-body simulation, which may exhibit
discontinuities due to contact, into two separate continuous domains.
Subsequently, we align the robot state resulting from the simplified model with
a more precise, non-differentiable simulator to maintain sufficient simulation
accuracy. Our framework enables learning quadruped walking in minutes using a
single simulated robot without any parallelization. When augmented with GPU
parallelization, our approach allows the quadruped robot to master diverse
locomotion skills, including trot, pace, bound, and gallop, on challenging
terrains in minutes. Additionally, our policy achieves robust zero-shot
locomotion performance in the real world. To the best of our knowledge, this
work represents the first demonstration of using differentiable simulation for
controlling a real quadruped robot. This work provides several important
insights into using differentiable simulations for legged locomotion in the
real world.
♻ ☆ Non-smooth Control Barrier Functions for Stochastic Dynamical Systems
Uncertainties arising in various control systems, such as robots that are
subject to unknown disturbances or environmental variations, pose significant
challenges for ensuring system safety, such as collision avoidance. At the same
time, safety specifications are becoming increasingly complex, e.g., by
composing multiple safety objectives through Boolean operators, resulting in
non-smooth descriptions of safe sets. Control Barrier Functions (CBFs) have
emerged as a control technique to provably guarantee system safety. In most
settings, they rely on an assumption of having deterministic dynamics and
smooth safe sets. This paper relaxes these two assumptions by extending CBFs to
encompass control systems with stochastic dynamics and safe sets defined by
non-smooth functions. By explicitly considering the stochastic nature of system
dynamics and accommodating complex safety specifications, our method enables
the design of safe control strategies in uncertain and complex systems. We
provide formal guarantees on the safety of the system by leveraging the
theoretical foundations of stochastic CBFs and non-smooth safe sets. Numerical
simulations demonstrate the effectiveness of the approach in various scenarios.
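Boolean composition of safe sets, as mentioned above, is what makes the barrier functions non-smooth: the conjunction of constraints is the pointwise min of the individual h_i, and the disjunction is the max. These are standard constructions, sketched here for intuition rather than taken from the paper:

```python
import numpy as np

def h_and(hs):
    """Conjunction: the state is in ALL safe sets iff min_i h_i(x) >= 0."""
    return float(np.min(hs))

def h_or(hs):
    """Disjunction: the state is in AT LEAST ONE safe set iff max_i h_i(x) >= 0."""
    return float(np.max(hs))
```

The min/max are non-differentiable exactly where two constraints tie, which is why the smoothness assumption of classical CBF theory has to be relaxed.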
♻ ☆ MMP++: Motion Manifold Primitives with Parametric Curve Models
Motion Manifold Primitives (MMP), a manifold-based approach for encoding
basic motion skills, can produce diverse trajectories, enabling the system to
adapt to unseen constraints. Nonetheless, we argue that current MMP models lack
crucial functionalities of movement primitives, such as temporal and via-points
modulation, found in traditional approaches. This shortfall primarily stems
from MMP's reliance on discrete-time trajectories. To overcome these
limitations, we introduce Motion Manifold Primitives++ (MMP++), a new model
that integrates the strengths of both MMP and traditional methods by
incorporating parametric curve representations into the MMP framework.
Furthermore, we identify a significant challenge with MMP++: performance
degradation due to geometric distortions in the latent space, meaning that
similar motions are not closely positioned. To address this, Isometric Motion
Manifold Primitives++ (IMMP++) is proposed to ensure the latent space
accurately preserves the manifold's geometry. Our experimental results across
various applications, including 2-DoF planar motions, 7-DoF robot arm motions,
and SE(3) trajectory planning, show that MMP++ and IMMP++ outperform existing
methods in trajectory generation tasks, achieving substantial improvements in
some cases. Moreover, they enable the modulation of latent coordinates and
via-points, thereby allowing efficient online adaptation to dynamic
environments.
comment: 12 pages. This work has been submitted to the IEEE for possible
publication
♻ ☆ RoboDuet: A Framework Affording Mobile-Manipulation and Cross-Embodiment
Guoping Pan, Qingwei Ben, Zhecheng Yuan, Guangqi Jiang, Yandong Ji, Jiangmiao Pang, Houde Liu, Huazhe Xu
Combining the mobility of legged robots with the manipulation skills of arms
has the potential to significantly expand the operational range and enhance the
capabilities of robotic systems in performing various mobile manipulation
tasks. Existing approaches are confined to imprecise six degrees of freedom
(DoF) manipulation and possess a limited arm workspace. In this paper, we
propose a novel framework, RoboDuet, which employs two collaborative policies
to realize locomotion and manipulation simultaneously, achieving whole-body
control through interactions between the two policies. Surprisingly, beyond
large-range pose tracking, we find that the two-policy framework may enable
cross-embodiment deployment such as using different quadrupedal robots or other
arms. Our experiments demonstrate that the policies trained through RoboDuet
can accomplish stable gaits, agile 6D end-effector pose tracking, and zero-shot
exchange of legged robots, and can be deployed in the real world to perform
various mobile manipulation tasks. Our project page with demo videos is at
https://locomanip-duet.github.io .
♻ ☆ Nigel -- Mechatronic Design and Robust Sim2Real Control of an Over-Actuated Autonomous Vehicle
Simulation to reality (sim2real) transfer from a dynamics and controls
perspective usually involves re-tuning or adapting the designed algorithms to
suit real-world operating conditions, which often violates the performance
guarantees established originally. This work presents a generalizable framework
for achieving reliable sim2real transfer of autonomy-oriented control systems
using multi-model multi-objective robust optimal control synthesis, which
lends itself well to uncertainty handling and disturbance rejection with theoretical
guarantees. Particularly, this work is centered around a novel
actuation-redundant scaled autonomous vehicle called Nigel, with independent
all-wheel drive and independent all-wheel steering architecture, whose enhanced
configuration space bodes well for robust control applications. To this end, we
present the mechatronic design, dynamics modeling, parameter identification,
and robust stabilizing as well as tracking control of Nigel using the proposed
framework, with exhaustive experimentation and benchmarking in simulation as
well as real-world settings.
♻ ☆ PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
We present a new interaction mechanism of prediction and planning for
end-to-end autonomous driving, called PPAD (Iterative Interaction of Prediction
and Planning Autonomous Driving), which considers the timestep-wise interaction
to better integrate prediction and planning. An ego vehicle performs motion
planning at each timestep based on the trajectory prediction of surrounding
agents (e.g., vehicles and pedestrians) and its local road conditions. Unlike
existing end-to-end autonomous driving frameworks, PPAD models the interactions
among ego, agents, and the dynamic environment in an autoregressive manner by
interleaving the Prediction and Planning processes at every timestep, instead
of a single sequential process of prediction followed by planning.
Specifically, we design ego-to-agent, ego-to-map, and ego-to-BEV interaction
mechanisms with hierarchical dynamic key objects attention to better model the
interactions. The experiments on the nuScenes benchmark show that our approach
outperforms state-of-the-art methods.
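The core idea of the abstract, replanning against a fresh prediction at every timestep instead of predicting the whole horizon once and then planning, can be illustrated with a toy 1-D rollout. Everything below (the drift/yield dynamics, the safety gap, the step sizes) is a hypothetical stand-in, not PPAD's actual model.

```python
# Toy 1-D illustration of the interleaved scheme: at each timestep the ego
# replans against an updated agent prediction, rather than running a single
# predict-then-plan pass over the whole horizon.

def predict_agent(agent_pos, ego_pos):
    # The agent drifts forward, speeding up slightly when the ego closes in.
    gap = agent_pos - ego_pos
    return agent_pos + (1.0 if gap > 2.0 else 1.2)

def plan_ego_step(ego_pos, predicted_agent):
    # The ego advances as fast as possible while keeping a 2.0 safety gap.
    return min(ego_pos + 1.5, predicted_agent - 2.0)

def interleaved_rollout(ego_pos, agent_pos, horizon=5):
    ego_traj = []
    for _ in range(horizon):
        agent_pos = predict_agent(agent_pos, ego_pos)  # Prediction step
        ego_pos = plan_ego_step(ego_pos, agent_pos)    # Planning step
        ego_traj.append(ego_pos)
    return ego_traj

traj = interleaved_rollout(ego_pos=0.0, agent_pos=4.0)
```

Because the prediction at step t sees the ego position chosen at step t-1, the loop is autoregressive in exactly the sense the abstract describes; a single sequential pass would instead predict all agent positions up front from the initial ego state.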
♻ ☆ SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System
Accuracy and computational efficiency are the most important metrics for a
Visual Inertial Navigation System (VINS). Existing VINS algorithms offer
either high accuracy or low computational complexity, making it difficult to
provide high-precision localization on resource-constrained devices. To this end,
we propose a novel filter-based VINS framework named SchurVINS, which
guarantees both high accuracy, by building a complete residual model, and low
computational complexity, via the Schur complement. Technically, we first
formulate the full residual model in which the gradient, Hessian, and
observation covariance are explicitly modeled. The Schur complement is then
employed to decompose the full model into an ego-motion residual model and a
landmark residual model. Finally, an Extended Kalman Filter (EKF) update is
implemented in these two models with high efficiency. Experiments on the EuRoC and TUM-VI datasets show that our method
notably outperforms state-of-the-art (SOTA) methods in both accuracy and
computational complexity. The experimental code of SchurVINS is available at
https://github.com/bytedance/SchurVINS.
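The Schur-complement step the abstract describes can be sketched numerically: given a full Hessian over ego-motion states x and landmarks l, eliminating the landmark block yields a small ego-motion system whose solution matches that of the full system. The dimensions and random problem below are toy values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_l = 3, 6  # 3 ego-motion states, 6 landmark states (toy sizes)
A = rng.standard_normal((n_x + n_l, n_x + n_l))
H = A @ A.T + np.eye(n_x + n_l)      # SPD full Hessian
b = rng.standard_normal(n_x + n_l)   # full gradient

H_xx, H_xl = H[:n_x, :n_x], H[:n_x, n_x:]
H_lx, H_ll = H[n_x:, :n_x], H[n_x:, n_x:]
b_x, b_l = b[:n_x], b[n_x:]

# Schur complement: marginalize landmarks, leaving an n_x-sized system.
H_ll_inv = np.linalg.inv(H_ll)
H_red = H_xx - H_xl @ H_ll_inv @ H_lx
b_red = b_x - H_xl @ H_ll_inv @ b_l

dx_reduced = np.linalg.solve(H_red, b_red)
dx_full = np.linalg.solve(H, b)[:n_x]  # identical ego-motion update
```

The payoff is computational: in a real VINS, H_ll is block-diagonal (one small block per landmark), so inverting it is cheap and the expensive solve runs only on the small ego-motion block.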
♻ ☆ DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions
Sanket Kalwar, Mihir Ungarala, Shruti Jain, Aaron Monis, Krishna Reddy Konda, Sourav Garg, K Madhava Krishna
Semantic segmentation in adverse weather scenarios is a critical task for
autonomous driving systems. While foundation models have shown promise, the
need for specialized adaptors becomes evident for handling more challenging
scenarios. We introduce DiffPrompter, a novel differentiable visual and latent
prompting mechanism aimed at expanding the learning capabilities of existing
adaptors in foundation models. Our proposed $\nabla$HFC image processing block
excels particularly in adverse weather conditions, where conventional methods
often fall short. Furthermore, we investigate the advantages of jointly
training visual and latent prompts, demonstrating that this combined approach
significantly enhances performance in out-of-distribution scenarios. Our
differentiable visual prompts leverage parallel and series architectures to
generate prompts, effectively improving object segmentation tasks in adverse
conditions. Through a comprehensive series of experiments and evaluations, we
provide empirical evidence to support the efficacy of our approach. Project
page at https://diffprompter.github.io.
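The abstract names a $\nabla$HFC image processing block but does not define it. A common differentiable high-frequency-component extraction, assumed here only as an illustrative stand-in for whatever the paper actually uses, subtracts a Gaussian-blurred copy from the input so that edges and fine texture survive while smooth regions vanish.

```python
import numpy as np

def gaussian_kernel_1d(sigma=1.0, radius=2):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def high_frequency_component(img, sigma=1.0):
    """Separable Gaussian blur; the residual is the high-frequency content."""
    k = gaussian_kernel_1d(sigma)
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    return img - blurred

img = np.zeros((8, 8))
img[:, 4:] = 1.0                       # a sharp vertical edge
hfc = high_frequency_component(img)    # large response near the edge only
```

Because every operation is a convolution or subtraction, the block is differentiable end-to-end, which is what allows such a prompt generator to be trained jointly with latent prompts as the abstract describes.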
♻ ☆ Leveraging Symmetry in RL-based Legged Locomotion Control
Zhi Su, Xiaoyu Huang, Daniel Ordoñez-Apraez, Yunfei Li, Zhongyu Li, Qiayuan Liao, Giulio Turrisi, Massimiliano Pontil, Claudio Semini, Yi Wu, Koushil Sreenath
Model-free reinforcement learning is a promising approach for autonomously
solving challenging robotics control problems, but it faces exploration
difficulties when given no information about the robot's kinematic and dynamic
morphology. Under-exploration among the multiple modalities with symmetric
states leads to behaviors that are often unnatural and sub-optimal. This issue becomes
particularly pronounced in the context of robotic systems with morphological
symmetries, such as legged robots for which the resulting asymmetric and
aperiodic behaviors compromise performance, robustness, and transferability to
real hardware. To mitigate this challenge, we can leverage symmetry to guide
and improve the exploration in policy learning via equivariance/invariance
constraints. In this paper, we investigate the efficacy of two approaches to
incorporate symmetry: modifying the network architectures to be strictly
equivariant/invariant, and leveraging data augmentation to approximate
equivariant/invariant actor-critics. We implement the methods on challenging
loco-manipulation and bipedal locomotion tasks and compare with an
unconstrained baseline. We find that the strictly equivariant policy
consistently outperforms other methods in sample efficiency and task
performance in simulation. In addition, symmetry-incorporated approaches
exhibit better gait quality, higher robustness and can be deployed zero-shot in
real-world experiments.
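The data-augmentation variant the abstract mentions amounts to duplicating each transition under the robot's morphological symmetry so the actor-critic sees both mirror images. The sketch below uses hypothetical sign-flip matrices for a toy 4-D state and 2-D action; a real legged robot's mirror operators would permute and flip joint coordinates.

```python
import numpy as np

# Hypothetical mirror operators for a toy system: flip the lateral state
# components and the first action component.
M_s = np.diag([1.0, -1.0, 1.0, -1.0])  # state mirror
M_a = np.diag([-1.0, 1.0])             # action mirror

def augment(states, actions, rewards):
    """Append mirrored copies of a batch: rewards are invariant under the
    symmetry, while states and actions transform equivariantly."""
    states_m = states @ M_s.T
    actions_m = actions @ M_a.T
    return (np.vstack([states, states_m]),
            np.vstack([actions, actions_m]),
            np.concatenate([rewards, rewards]))

s = np.array([[0.1, 0.2, -0.3, 0.4]])
a = np.array([[0.5, -0.5]])
r = np.array([1.0])
s_aug, a_aug, r_aug = augment(s, a, r)
```

This only approximates equivariance (the network can still break symmetry between augmented pairs), which is consistent with the abstract's finding that the strictly equivariant architecture performs best.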
♻ ☆ Optimal Sensor Deception to Deviate from an Allowed Itinerary
In this work, we study a class of deception planning problems in which an
agent aims to alter a security monitoring system's sensor readings so as to
disguise its adversarial itinerary as an allowed itinerary in the environment.
The adversarial itinerary set and allowed itinerary set are captured by regular
languages. To deviate without being detected, we investigate whether there
exists a strategy for the agent to alter the sensor readings, with a minimal
cost, such that for any of those paths it takes, the system thinks the agent
took a path within the allowed itinerary. Our formulation assumes an offline
sensor alteration: the agent determines the sensor alteration strategy,
implements it, and then carries out any path in its deviation itinerary. We prove
that the problem of solving the optimal sensor alteration is NP-hard, by a
reduction from the directed multi-cut problem. Further, we present an exact
algorithm based on integer linear programming and demonstrate the correctness
and the efficacy of the algorithm in case studies.
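The paper's exact algorithm is an integer linear program; as a toy illustration of the underlying optimization (not the paper's formulation), one can brute-force a minimal-cost symbol remapping on a tiny instance where itineraries are finite sets of observation strings. The instance, costs, and remapping model below are all hypothetical.

```python
from itertools import product

# Toy instance: the agent's deviation paths emit observation strings over
# {a, b, c}; the allowed itinerary is a finite set of observation strings.
adversarial = ["ac", "bc"]
allowed = {"ab", "bb"}
symbols = "abc"
# cost[x][y]: cost of making the monitor read y whenever sensor x fires.
cost = {x: {y: (0 if x == y else 1) for y in symbols} for x in symbols}

def best_alteration():
    """Enumerate all offline symbol remappings; keep the cheapest one under
    which every adversarial observation string lands in the allowed set."""
    best, best_cost = None, float("inf")
    for image in product(symbols, repeat=len(symbols)):
        remap = dict(zip(symbols, image))
        if not all("".join(remap[s] for s in p) in allowed
                   for p in adversarial):
            continue
        c = sum(cost[x][remap[x]] for x in set("".join(adversarial)))
        if c < best_cost:
            best, best_cost = remap, c
    return best, best_cost

remap, c = best_alteration()  # cheapest: alter only sensor c to read b
```

For regular-language itineraries and graph-structured environments, this search space explodes, which is where the NP-hardness result and the ILP encoding in the paper take over from naive enumeration.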